01. Video: Introduction

Introduction to Multiple Linear Regression

In this lesson, you will be extending your knowledge of simple linear regression, where you were predicting a quantitative response variable using a quantitative explanatory variable. That is, you were using an equation that looked like this:

$$\hat{y} = b_0 + b_1x_1$$

In this lesson, you will learn about multiple linear regression. In these cases, you will be using both quantitative and categorical x-variables to predict a quantitative response. That is, you will be creating equations that look like this to predict your response:

$$\hat{y} = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + b_4x_4$$
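To make the equation concrete, here is a minimal sketch of fitting a model with four x-variables using NumPy's least squares solver. Everything about the data is an assumption made for illustration: the variable names (`area`, `bedrooms`, `neighborhood`), the simulated values, and the "true" coefficients are all made up. The categorical variable is converted to 0/1 dummy columns, with one level left out as the baseline, so that it can enter the equation like any other x-variable.

```python
import numpy as np

# Simulated (hypothetical) data: two quantitative and one categorical x-variable.
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(50, 250, size=n)                  # quantitative x1
bedrooms = rng.integers(1, 6, size=n)                # quantitative x2 (values 1..5)
neighborhood = rng.choice(["A", "B", "C"], size=n)   # categorical, 3 levels

# Encode the categorical variable as 0/1 dummy columns.
# Level "A" is the baseline, so it gets no column of its own.
is_B = (neighborhood == "B").astype(float)           # x3
is_C = (neighborhood == "C").astype(float)           # x4

# Design matrix: a column of ones for the intercept b0, then x1..x4.
X = np.column_stack([np.ones(n), area, bedrooms, is_B, is_C])

# Made-up coefficients used only to simulate a response with some noise.
true_b = np.array([10.0, 2.0, 5.0, 15.0, -8.0])
y = X @ true_b + rng.normal(0, 0.5, size=n)

# Least squares estimates of b0..b4, and the predicted responses y-hat.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

print("estimated coefficients b0..b4:", np.round(b, 2))
```

Each estimated coefficient lines up with one term in the equation above: `b[0]` is the intercept, `b[1]` and `b[2]` multiply the quantitative variables, and `b[3]` and `b[4]` shift the prediction for neighborhoods B and C relative to the baseline A.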

Furthermore, you will learn how to identify problems that can arise in multiple linear regression, how to address these problems, and how to assess how well your model is performing. It turns out R-squared can still be used, but it can be misleading, because it never decreases when you add more x-variables, even ones unrelated to the response. And, unfortunately, the correlation coefficient only measures the linear relationship between two quantitative variables, so it will not be very useful in the multiple linear regression case.
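The caution about R-squared can be illustrated with a small simulation. This is a sketch under stated assumptions, not a recipe from the lesson: the data are simulated, and `r_squared` is a hypothetical helper written for this example. It fits one model with a single useful x-variable, then a second model that also includes five columns of pure noise, and compares the two R-squared values. It also computes the correlation coefficient, which is defined for one pair of variables at a time.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least squares fit; X must include an intercept column."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    ss_res = np.sum(resid ** 2)            # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    return 1.0 - ss_res / ss_tot

# Simulated (hypothetical) data: y depends on x1 only.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

# Model 1: intercept and x1.
X_small = np.column_stack([np.ones(n), x1])

# Model 2: the same model plus five noise columns unrelated to y.
noise = rng.normal(size=(n, 5))
X_big = np.column_stack([X_small, noise])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)

# The correlation coefficient compares exactly two variables.
r = np.corrcoef(x1, y)[0, 1]

print("R-squared, x1 only:        ", round(r2_small, 4))
print("R-squared, x1 plus noise:  ", round(r2_big, 4))
print("r squared for the pair x1,y:", round(r ** 2, 4))
```

The second R-squared is at least as large as the first even though the extra columns carry no information about the response, which is why a higher R-squared alone does not show that a larger model is better. With a single x-variable, the squared correlation coefficient equals R-squared, but there is no single correlation coefficient that summarizes a model with several x-variables.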

Here is a wonderful free supplementary book: An Introduction to Statistical Learning. This is an absolutely spectacular book for getting started with machine learning, and Chapter 3 discusses many of the ideas in this lesson. The programming in the text is done in R, but here is an additional resource, not created by the book's authors, that provides Jupyter Notebooks in Python with notes and answers to nearly all the questions from the book: https://www.reddit.com/r/learnpython/comments/6rkh3u/introduction_to_statistical_learning_with_python/